Skip to main content
Version: 2024.05-cpu

active learning

"Machine learning force fields (MLFF), compared to traditional methods, can predict material properties and reaction mechanisms faster and more accurately. The current state-of-the-art deep learning-based molecular dynamics can simulate systems with billions of atoms. However, due to the interpolation nature of machine learning methods, MLFFs struggle to make accurate predictions in the phase space outside the training set. Since training data is usually generated using expensive first-principles calculations, it is challenging to obtain a large amount of ab initio data that is both representative and independent of extensive calculations. Improving the extrapolation capability of MLFF models without relying on a large amount of ab initio data is crucial. PWact (Active learning based on PWMAT Machine Learning Force Field) is an open-source automated active learning platform based on PWMLFF, designed for efficient data sampling."

AL-PWMLFF

The AL-PWMLFF platform consists of two main components: the main task and the task scheduler, as shown in the architecture diagram.

Arch_design_slurm

The main task includes two modules, preparing pre-training data (init_bulk) and active learning (sampling). It is responsible for generating computational tasks and collecting results during the preparation of pre-training data and the active learning process. The task scheduler receives task scheduling requests and assigns tasks to the corresponding computing nodes based on the resource utilization and task resource requirements. After the tasks are executed, the task scheduler collects the execution results from the computing nodes and returns them to the main task program.

pre-training data preparation module

Includes four sub-modules: relaxation (supporting PWMAT, VASP, CP2K, and DFTB), supercell generation, lattice scaling, lattice perturbation, and running MD (supporting PWMAT, VASP, CP2K, and DFTB). It also supports combinations of these modules.

active learning module

active_sampling

The active learning module consists of three sub-modules: training, configuration exploration, and annotation (supporting PWMAT, VASP, CP2K, and DFTB). First, the training module performs model training. Then, the trained model is passed to the exploration module, which uses the force field model for molecular dynamics simulations. After the simulation, the molecular motion trajectory is passed to the query module for uncertainty measurement. Once the query is completed, the annotated configuration points are sent to the annotation module. Finally, the annotation module performs self-consistent calculations to obtain energy and forces, which are used as labels along with the corresponding configurations in the annotated database. This process is repeated until convergence.

  1. For model training, PWMLFF supports DP model, DP model with compress, DP model with type embedding and NEP(NEP4) model.

  2. For uncertainty measurement, common methods based on multiple-model committee queries are provided, as well as our latest design, the single-model Kalman Prediction Uncertainty (KPU) based on Kalman filtering. This method can reduce the computational cost of model training to 1/N, where N is the number of models in the committee query, while achieving accuracy close to the committee query. Users are welcome to try this method. For the KPU method, it is only applicable to the DP model.

  3. For annotation, PWMAT or VASP is supported.

The pre-training data preparation module

It includes four sub-modules: relaxation (supporting PWMAT or VASP), supercell generation, lattice scaling, lattice perturbation, and running MD (supporting DFTB, PWMAT, or VASP). It also supports combinations of these modules.

Dependencies

  1. AL-PWMLFF job scheduling uses the SLURM cluster management and job scheduling system. SLURM must be installed on your computing cluster.

  2. DFT calculations in AL-PWMLFF support PWmat, VASP, CP2K and DFTB. We have integrated DFTB in PWmat. You can find detailed usage instructions in the DFTB_DETAIL section of the PWmat Manual.

  3. AL-PWMLFF model training is based on PWMLFF. Refer to the PWMLFF documentation for installation instructions (Download address for PWmat version integrated with DFTB).

  4. AL-PWMLFF Lammps molecular dynamics simulation is based on Lammps_for_pwmlff. Refer to the Lammps_for_pwmlff documentation for installation instructions.

Installation Guide

PWact supports two installation methods: pip command installation and source code installation.

1. Pip Command Installation

Source Code Download

git clone https://github.com/LonxunQuantum/PWact.git
or
git clone https://gitee.com/pfsuo/pwact.git
The Gitee repository may not be updated as promptly as GitHub, so it is recommended to download from GitHub.

After downloading the source code, navigate to the root directory (at the same level as setup.py) and run the following command:

pip install .
# Or use the developer option, which installs without copying files. It reads directly from the source files, meaning any changes to the source code will take effect immediately. This is useful for users who need to modify the source code themselves.
# pip install -e .

PWact is developed in Python and supports Python 3.9 and above. It is recommended to use the Python runtime environment of PWMLFF directly.

If you need to create a separate virtual environment for PWact, you only need to install the following dependencies (matching your Python version, supporting Python 3.9 and above).

pip install numpy pandas tqdm pwdata

Command List

PWact includes the following commands, starting with the command pwact.

1. Output a list of available commands

pwact  [ -h / --help / help ]

You can also use this command to check if PWact is installed successfully.

2. Output the parameter list for cmd_name

pwact cmd_name -h

3. Prepare initial training set

pwact init_bulk param.json resource.json

4. Active Learning

pwact run param.json resource.json

For the above two commands, the names of the json files can be changed by the user, but the order of input for param.json and resouce.json must remain the same.

5. Utility Commands

Convert MOVEMENT or OUTCAR to PWdata format

pwact to_pwdata

Search for labeled datasets in the active learning directory

pwact gather_pwdata

End the ongoing init_fulk tasks, such as relaxation and AIMD tasks.

pwact kill init_bulk

End running tasks that are currently in progress, including training, exploration (MD), or tagging tasks

pwact kill run

The kill command function above can also be replaced by manual operation. You need to first end the executing main process, that is, the window that executes pwact init_fulk or pwact run; The second step requires you to manually end the ongoing Slurm task.

Considering that manual operation may accidentally terminate your other processes, it is recommended that you use the command to terminate.

After using the command to end a process, it is recommended that you check the command output information and use the sludge command to see if there are any unfinished processes.

Input Files

AL-PWMLFF requires two input files, param.json and resource.json, for initial dataset preparation or active learning. AL-PWMLFF is not sensitive to the case input of keys in two JSON files.

param.json

Initial Training Set Preparation - init_param.json

Configurations (VASP, PWMAT format) for relaxation, supercell, scaling, perturbation, and AIMD (DFTB, PWMAT, VASP) settings.

Active Learning - run_param.jso

Training settings (network structure, optimizer), exploration settings (LAMMPS settings, sampling strategies), and labeling settings (VASP/PWMAT self-consistent calculation settings).

resource.json

Settings for computational cluster resources, including computing nodes, CPU, GPU resources for training, molecular dynamics (MD), DFT calculations (SCF, Relax, AIMD), and corresponding software (LAMMPS, VASP, PWMAT, PWMLFF).

Example

example of Si